Abstract

Copy number variation (CNV) has been recognized as a major contributor to human genome diversity. It plays an important
role in determining phenotypes and has been associated with a number of common and complex diseases. However, CNV
data from diverse populations are still limited. Here we report the first investigation of CNV in the indigenous populations
of Peninsular Malaysia. We genotyped 34 Negrito genomes from Peninsular Malaysia using the Affymetrix SNP 6.0
microarray and identified 48 putative novel CNVs, consisting of 24 gains and 24 losses, of which 5 were identified in at least
2 unrelated samples. These CNVs appear unique to the Negrito population and were absent from the DGV, HapMap3 and
Singapore Genome Variation Project (SGVP) datasets. Gene ontology analysis revealed that genes within these CNVs were
enriched in the immune system (GO:0002376), response to stimulus (GO:0050896), metabolic processes
(GO:0008152) and regulation of transcription (GO:0006355). Copy number gains in CNV regions (CNVRs) enriched with
genes were significantly more frequent than losses (P<0.001). In view of the small population size, relative isolation and
semi-nomadic lifestyle of this community, we speculate that these CNVs may be attributable to recent local adaptation of
Negritos from Peninsular Malaysia.
autosomal microsatellite markers and mitochondrial DNA suggest
that these tribes are genetically similar and may have experienced
high levels of genetic drift [8,10,11]. It is believed that they may
have adapted to environmental changes over the centuries to cope
with limited food resources and the tropical rainforest environment.
Currently, the number of Negritos is dwindling rapidly as Malaysia
becomes more developed and forests are cleared. Characterizing
the genetic variation of isolated populations such as the Negritos
provides valuable information for the gene mapping of complex
diseases [12]. It is therefore crucial to unveil their genetic makeup
in order to better understand how genetic variation contributes to
the well-being and health of human populations, especially in the
Southeast Asian region.

Copy number variations (CNVs) typically range from 1 kb to
several megabases in size [13] and are acknowledged as a major
contributor to genetic diversity. This variability plays an important
role in determining phenotypes such as physical features and



Introduction 

Southeast Asia is believed to be one of the earliest regions outside
Africa with recorded habitation by the genus Homo. This may have
occurred nearly 2 million years ago, following the arrival of the
ancient Javanese hominin Homo erectus [1]. The Negrito people
are believed to be direct descendants of humans who arrived in
Peninsular Malaysia more than 60,000 years ago [2-4]. Ancestral
Homo sapiens, who originated in Africa [5], migrated into Asia
along the coastal route [6]. The Negritos of Peninsular Malaysia
are of Austroasiatic origin [7] and, based on superficial anatomical
features and foraging lifestyles, are thought to be related to the
Philippine Aeta and Andaman Islanders as well as the Melanesians,
Tasmanians and certain tropical Australian rainforest foragers [8].
In Malaysia, Negritos are divided into six tribes, namely Bateq,
Mendriq, Jehai, Kensiu, Lanoh and Kintak, based on linguistics,
socio-cultural practices and the geographical region inhabited;
together they number approximately 0.15% of the total
population [9]. Studies on various genetic markers including
Table 1. Candidate gene primer sequences and copy numbers amplified in the SYBR Green qPCR assay.

Locus | CNV size spanned (bp) | Primer sequences | Expected amplicon size (bp) | Annealing temp (°C) | Copy number
ADH7 | 153,593 | Forward: gaaggcacaagctgctgttat; Reverse: catcctgtctttgtcttggatct | 99 | 59.6 | 3 (2.80, 0.108)
CSMD1 | 301,535 | Forward: actctgaacggtgtcctggttt; Reverse: ttcctaagctgcaaaggtgtg | 92 | 62.2 | 3 (3.11, 0.062)
SH2D4B | 15,464 | Forward: atgttctatgctgtggtggatg; Reverse: acgaactttgtcagaaacgtga | 101 | 59.9 | 1 (0.45, 0.042)
NPAS3 | 25,484 | Forward: ctgttggcttagaggctgagat; Reverse: agcccttgagatgattcctaca | 109 | 60.0 | 1 (1.32, 0.65)
WDR4 | 165,544 | Forward: acaggtttgtgagccgtatctc; Reverse: tcaagaatccagaggtgagtga | 106 | 60.0 | 2 (2.10, 0.14)
LRRC30 | 9,547 | Forward: cttgcacgtgggctcgaatc; Reverse: ggatgttgttgccctctgcg | 95 | 66.3 |
TNFRSF1B | 83,214 | Forward: cattaggagatgtgtggtcctg; Reverse: aacagtatgtcccgttctgtctc | 90 | 59.6 | 3 (3.09, 0.008)
PRIMER 1 | 36,283 | Forward: acagaacctaagcggaaatcct; Reverse: aactggaagcaagatgctgact | 107 | 64.0 | 3 (3.40, 0.08)
PRIMER 2 | 65,481 | Forward: ccctgaagcgtgagtctctaat; Reverse: tgataacacctctgcacattcc | 89 | 63.5 | 3 (2.50, 0.12)
PRIMER 3 | 42,399 | Forward: ggtcttcagtttgtgcttcagat; Reverse: catcacttcctagcgccttc | 80 | 63.4 | 3 (2.90, 0.07)
PRIMER 4 | 63,260 | Forward: tcctaaagtttccgcaggag; Reverse: ctcacttcactggtgtcaggtt | 99 | 63.2 | 1 (1.14, 0.32)
QCNV2 | 9,812 | Forward: caggcaagttcatatgttcca; Reverse: agaggaatgccagatagagcag | 113 | 63.6 | 3 (2.90, 0.11)
QCNV4 | 4,021 | Forward: acttggtaaattgtgttga; Reverse: tgtcagtcctgcattt | 104 | 52.4 | 2 (2.20, 0.17)

WDR4 and QCNV4 showed a normal copy number and were therefore considered false positives. QCNV2 was detected as a CN gain by microarray, inconsistent with the qPCR
validation, and was therefore considered a false positive. Values in parentheses: unrounded copy number calculated by relative quantification, and standard deviation.
doi:10.1371/journal.pone.0100371.t001



conferring susceptibility to a number of common and complex
diseases, including HIV, psoriasis and several neuropsychiatric
diseases [14-17], potentially by altering gene expression levels
and gene dosage [18,19]. CNVs account for a significant proportion
of the genome [13,20], are highly variable, and often harbor genes
sensitive to environmental stimuli, such as those involved in
immunity, metabolism and olfaction (olfactory receptors) [21-23].
Because CNVs are distributed non-randomly across the genome, it
is believed that they may have been shaped by selection [21,24].

Most genetic diversity data for indigenous populations have been
based on single nucleotide polymorphisms (SNPs)/single nucleotide
variations (SNVs) [6,25,26] and maternally inherited mitochondrial
DNA [4,8], except for a handful of studies [27-33]. To date,
CNVs in the indigenous populations of Peninsular Malaysia have not
been reported. As a complement to the existing SNP data, we
constructed the first CNV map of Negrito individuals from
Peninsular Malaysia and report the distribution of novel and
population-specific CNVs. Our findings may provide fundamental
insights into the genetic architecture of the Negritos that can aid
biomedical and evolutionary investigations.

Materials and Methods 

Sample Recruitment 

This study was reviewed and approved by the Research and
Ethics Committee of Universiti Teknologi MARA [Ref no: 600-RMI (5/1/6)]
and the Department of Orang Asli Development
(Jabatan Kemajuan Orang Asli Malaysia, JAKOA) [Ref no:
JHEOA.PP.30.052.Jld 5(17)]. Prior to sample collection, the
headman of the tribe and/or the community members were first
consulted during a customary courtesy visit, and their consent was
obtained. During sampling, all participants were interviewed and
gave informed written consent. The interviews and informed written
consent were conducted in the Malay language and witnessed by an
officer from JAKOA. Only Negrito participants aged 18 years or older
who gave consent were selected. We collected 10 ml of peripheral
blood from 34 unrelated individuals (17 males and 17 females) from
the sub-tribes Jahai, Bateq, Mendriq and Kensiu. DNA was extracted
from blood using the Qiagen Blood Extraction Kit (Qiagen, Hilden,
Germany) according to the manufacturer's protocol.

Microarray Genotyping 

Genotyping was performed on the Affymetrix SNP6.0 Array
platform according to the manufacturer's instructions. Briefly,
250 ng of genomic DNA was digested and ligated, and the ligated
products were then PCR amplified. Amplicons were electrophoresed,
purified and quantified to ensure that the samples passed
quality control (QC) measures before further processing. The
products were then fragmented, hybridized onto the Affymetrix
SNP6.0 chips and stained. Chips were scanned, and raw data were
generated using Affymetrix Genotyping Console Software (GTC)
version 3.0.2 with default settings.
Figure 1. CNVR map of Negrito samples. The ideogram summarizes the distribution of CNVRs on each human chromosome. Red indicates
copy number loss, blue indicates copy number gain and green indicates multi-allelic loci.
doi:10.1371/journal.pone.0100371.g001
Figure 2. Length distribution of the CNVs in Negritos from
Peninsular Malaysia.

doi:10.1371/journal.pone.0100371.g002



Copy Number Variation Analysis and Validation 

CNVs were called independently using three algorithms,
Affymetrix GTC, Birdsuite and iPattern (TCAG), as described
previously [35]. We applied stringent filtering criteria: each
CNV had to be at least 1 kb in size, span at least 5 consecutive
probes, and be detected by at least 2 of the 3 algorithms. In
addition, we excluded CNVs on the X and Y chromosomes and
CNVs within approximately 300 kb of the centromeres and
telomeres. To define a set of rare CNVs, we excluded
known polymorphic loci (i.e., copy number polymorphisms (CNPs)
targeted by the array) and CNVs with more than 50%
reciprocal overlap with those reported in DGV.
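The filtering rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Cnv` record and all function names are hypothetical, two calls are assumed to "agree" when they come from the same sample, have the same type and share at least 50% reciprocal overlap (the source does not state how agreement between algorithms was defined), the centromere/telomere windows must be supplied by the user for the relevant genome build, and the exclusion of array-targeted CNPs is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cnv:
    """One CNV call; coordinates are 1-based and inclusive (hypothetical record)."""
    sample: str
    chrom: str        # "1".."22", "X" or "Y"
    start: int
    end: int
    n_probes: int
    kind: str         # "gain" or "loss"
    caller: str       # "GTC", "Birdsuite" or "iPattern"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def reciprocal_overlap(a, b) -> float:
    """Smaller of overlap/len(a) and overlap/len(b); 0.0 if disjoint."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / a.length, overlap / b.length)


def passes_basic_filters(call, excluded, min_len=1_000, min_probes=5):
    """Size, probe-count and autosome filters plus exclusion windows.

    `excluded` holds objects with chrom/start/end attributes describing the
    ~300 kb windows next to centromeres and telomeres (user-supplied).
    """
    if call.length < min_len or call.n_probes < min_probes:
        return False
    if call.chrom in ("X", "Y"):
        return False
    return not any(
        w.chrom == call.chrom and w.start <= call.end and call.start <= w.end
        for w in excluded
    )


def consensus_calls(calls, min_callers=2, min_ro=0.5):
    """Keep calls supported by at least `min_callers` algorithms.

    Support means a same-sample, same-type call with >= min_ro reciprocal
    overlap. One record per supporting algorithm is kept; collapsing them
    into a single merged call is a separate step.
    """
    kept = []
    for c in calls:
        callers = {c.caller}
        for o in calls:
            if (o.sample == c.sample and o.kind == c.kind
                    and reciprocal_overlap(c, o) >= min_ro):
                callers.add(o.caller)
        if len(callers) >= min_callers:
            kept.append(c)
    return kept


def is_rare(call, dgv, max_ro=0.5):
    """Rare: no DGV entry shares more than 50% reciprocal overlap."""
    return all(reciprocal_overlap(call, d) <= max_ro for d in dgv)
```

For example, a GTC loss at 10,001-15,000 and an iPattern loss at 11,001-16,000 in the same sample share 4,000 bp, i.e. 80% reciprocal overlap, so both records survive the 2-of-3 consensus step.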

The filtered CNV calls were then compared with the HapMap3
dataset and subsequently with the Singapore Genome Variation Project
(SGVP) (http://www.statgen.nus.edu.sg/~SGVP/) to further
identify CNVs unique to the Peninsular Malaysia Negritos. We
defined a CNV as putative novel and unique to Negritos (denoted
Table 2. General characteristics of CNVs and CNVRs among 34 Negrito genomes from Peninsular Malaysia.

Characteristic | GTC | Birdsuite | iPattern | Merged*
Total CNV count:
Gain | 530 | 735 | 1,430 | 330
Loss | 803 | 1,901 | 2,262 | 781
Complex CNV | | | 40 |
Total | 1,333 | 2,636 | 3,692 | 1,111
Average number per genome:
Gain | 15.5 | 21.6 | 42.0 | 9.7
Loss | 23.6 | 55.9 | 66.5 | 23.0
Total | 39.2 | 77.5 | 108.6 | 32.7
Size (bp):
Min | 1,000 | 1,019 | 1,010 | 1,134
Max | 1,768,000 | 985,807 | 1,033,784 | 1,033,785

*Merged: stringent CNV calls supported by at least 2 of the 3 algorithms.
doi:10.1371/journal.pone.0100371.t002



as population-specific CNVs) when it was not present in any of the
HapMap3 or SGVP samples (defined as <50% reciprocal
overlap with every HapMap3 and SGVP CNV).
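The population-specific criterion (<50% reciprocal overlap with every HapMap3 and SGVP CNV) can be expressed as a small predicate. The sketch below is hypothetical: intervals are plain `(chromosome, start, end)` tuples with 1-based inclusive coordinates, and the reference sets stand in for whatever HapMap3 and SGVP call sets are loaded.

```python
from typing import Iterable, Tuple

# (chromosome, start, end) with 1-based inclusive coordinates
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of overlap/len(a) and overlap/len(b); 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1]) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1] + 1), overlap / (b[2] - b[1] + 1))


def is_population_specific(cnv: Interval,
                           *reference_sets: Iterable[Interval],
                           threshold: float = 0.5) -> bool:
    """True if the CNV has < threshold reciprocal overlap with every CNV
    in every reference set (e.g. HapMap3 and SGVP call sets)."""
    return all(
        reciprocal_overlap(cnv, ref) < threshold
        for refs in reference_sets
        for ref in refs
    )
```

Using a strict `<` mirrors the "<50%" wording, so a CNV sharing exactly half its length with a reference call would not count as population-specific.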

Negrito-specific CNVs overlapping annotated genes were validated
with a SYBR Green qPCR assay as previously described [34]. A total
of 50 ng (10 ng/µl) of genomic DNA was amplified in a reaction
mixture containing 12.5 µl of iQ SYBR Green Supermix (Bio-Rad) and
1 µl (7 µM) each of the respective forward and reverse primers,
topped up to a total volume of 25 µl with ddH2O. Cycling conditions
were 95°C for 3 min, followed by 40 cycles of 95°C for 30 s, the
respective annealing temperature of each locus for 15 s and 72°C
for 30 s.

A melting curve was performed to check the specificity of the
assay. Assay efficiency was assessed by generating a standard
curve from a serial five-fold dilution of a single genomic DNA
sample, from a top standard of 50 ng/µl down to 0.08 ng/µl
(10 ng to 0.016 ng). All reactions were run in triplicate, except
for a few that were run in duplicate because the genomic DNA was
insufficient. Normalization to the control gene Forkhead Box P2
(FOXP2) (primers 5'-TGACATGCCAGCTTATCTGTTT-3'
and 5'-GAGAAAAGCAATTTTCACAGTCC-3') was used to
give an estimate of copy number. The reproducibility of the qPCR
assay for each sample was assessed by estimating the
within-sample variation, measured as the coefficient of
variation (C.V. % = 100 × [standard deviation]/mean). The copy
number of the target sequence in each test sample was determined
using the comparative CT method (2^-ΔΔCT).
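The three quantities in this paragraph (standard-curve efficiency, C.V.% and comparative CT copy number) can be sketched as below. Several details are assumptions not stated in the source: efficiency is derived from the least-squares slope of CT against log10(input) as E = 10^(-1/slope) - 1; CT inputs are replicate means; and the ΔΔCT calibrator is a sample assumed to carry 2 copies of the target, so copy number = 2 × 2^-ΔΔCT with FOXP2 as the reference gene. Function names are hypothetical.

```python
import math
import statistics


def amplification_efficiency(input_ng, ct):
    """E = 10**(-1/slope) - 1, where slope is the least-squares slope of
    Ct against log10(input amount). E = 1.0 means perfect doubling."""
    x = [math.log10(v) for v in input_ng]
    mx, my = statistics.mean(x), statistics.mean(ct)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, ct))
             / sum((xi - mx) ** 2 for xi in x))
    return 10 ** (-1 / slope) - 1


def coefficient_of_variation(replicate_cts):
    """C.V. % = 100 * sample standard deviation / mean."""
    return 100 * statistics.stdev(replicate_cts) / statistics.mean(replicate_cts)


def copy_number(ct_target, ct_ref, ct_target_cal, ct_ref_cal, cal_copies=2):
    """Comparative CT: copies = cal_copies * 2**(-ddCt), where
    ddCt = (Ct_target - Ct_ref)_sample - (Ct_target - Ct_ref)_calibrator.
    Assumes ~100% efficiency for both target and reference (FOXP2) assays."""
    ddct = (ct_target - ct_ref) - (ct_target_cal - ct_ref_cal)
    return cal_copies * 2 ** (-ddct)
```

With this convention, a target that amplifies one cycle earlier (relative to FOXP2) than in the 2-copy calibrator is estimated at 4 copies, and one cycle later at 1 copy; the unrounded values in Table 1 would then be rounded to the nearest integer.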

Eight of the 12 CNVs tested (66.7%) were true positives (8 of the 9
CNVs >10 kb in length). However, all 3 CNVs smaller than
10 kb failed to validate. Given this low replication rate, we
removed CNVs <10 kb from further analysis. The
primer sequences and the copy numbers amplified for the
candidate CNVs are listed in Table 1.

The microarray dataset has been submitted to NCBI dbGaP.
The accession number assigned is: phs000664.v1.p1.

Gene Ontology Analysis 

We submitted the list of annotated genes underlying the observed
Negrito-specific CNVs to PANTHER (Protein ANalysis
THrough Evolutionary Relationships) (http://www.pantherdb.org/)
and DAVID (the Database for Annotation, Visualization
and Integrated Discovery, version 6.7)
(http://david.abcc.ncifcrf.gov/summary.jsp).
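Over-representation of a GO term in a gene list is commonly assessed with a one-sided hypergeometric (Fisher's exact) test. The sketch below shows only that generic test; PANTHER and DAVID implement their own variants (DAVID's EASE score, for instance, is a more conservative modification of Fisher's exact test), and the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background (e.g. the genome)
    K: background genes annotated with the GO term
    n: genes in the query list (e.g. genes in Negrito-specific CNVs)
    k: query genes annotated with the GO term
    """
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)
```

The integer sum is computed exactly before the single division, so the result does not accumulate floating-point error across terms.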

Results 

General Characteristics of CNV and CNVR 

We identified 1,333 autosomal CNVs with Genotyping Console
(Affymetrix), an average of 39.2 CNVs per genome, whilst the
total numbers of CNVs called by Birdsuite and iPattern were
Figure 3. UCSC Genome Browser view of a CNV on chromosome 3p22.2. The figure was produced from custom tracks listing the Negrito CNV calls,
uploaded to http://genome.ucsc.edu.
doi:10.1371/journal.pone.0100371.g003
Figure 4. Length distribution of the CNVs unique to the
Negritos from Peninsular Malaysia.
2,636 and 3,692, respectively (mean number of calls per genome
77.5 and 108.6, respectively; Table 2). After applying the stringent
filtering criteria, 1,111 overlapping CNVs were merged, an average
of 32.7 CNVs per genome (range 19 to 54 calls per genome),
covering 105,909,572 bp of the autosomal genome (Figure 1).
These corresponded to 263 CNVRs comprising 161 losses, 94 gains
and 8 multi-allelic sites.
Figure 2 shows the length distribution of CNVRs in this study.
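How individual CNVs were merged into CNVRs is not detailed in the text. A common convention, assumed here, is to take the union of CNVs that overlap by at least 1 bp across samples and to label a region as a gain, a loss, or multi-allelic when it contains both. A minimal sketch with hypothetical names and toy `(chromosome, start, end, kind)` tuples:

```python
from typing import List, Tuple

# (chromosome, start, end, kind) with 1-based inclusive coordinates;
# kind is "gain" or "loss" for calls, plus "multi-allelic" for regions
Call = Tuple[str, int, int, str]
Region = Tuple[str, int, int, str]


def merge_into_cnvrs(calls: List[Call]) -> List[Region]:
    """Union calls that overlap by >= 1 bp (pooled across samples) into
    CNVRs labelled "gain", "loss" or "multi-allelic" (both types present)."""
    merged = []  # mutable [chrom, start, end, set_of_kinds]
    for chrom, start, end, kind in sorted(calls):
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)  # extend the current region
            last[3].add(kind)
        else:
            merged.append([chrom, start, end, {kind}])
    return [
        (chrom, start, end,
         next(iter(kinds)) if len(kinds) == 1 else "multi-allelic")
        for chrom, start, end, kinds in merged
    ]


def covered_bp(regions: List[Region]) -> int:
    """Total base pairs spanned by non-overlapping regions."""
    return sum(end - start + 1 for _, start, end, _ in regions)
```

For example, overlapping losses at 100-200 and 150-300 on chromosome 1 would form a single 201 bp loss CNVR.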

Comparison of Common CNVs 

We first compared the diversity of common CNVs with the
HapMap3 dataset (1,072 samples from 10 populations). CNVs that
showed significant differences in allele frequency are listed in
Table 3. Notably, CNV losses at chromosome 3p22.2
(37,957,108-37,961,932) were observed in 56% of the Negrito
samples in this study, compared with the HapMap3 populations
(Table 3; Figure 3). The gene CTDSPL, located within this CNV,
has been associated with prostate cancer
(https://www.genome.gov/26525384). The CNV at chromosome
15q13.3 was another region of interest: its frequency was higher
in the Negritos (0.44) than in the HapMap3 samples (0.09 to 0.21).
The gene CHRNA7, located within this CNV, has been associated
with schizophrenia and epilepsy [35,36].

Population Specific CNVs 

Our dataset was further compared with the HapMap3 dataset.
This analysis revealed 62 CNVs (corresponding to 36 CNVRs) unique
to our Negrito samples. However, because of the high false discovery
rate, CNVs <10 kb were excluded from further analysis, leaving
48 CNVs (24 gains; 24 losses), of which 32 were singletons
(Table 4). The length distribution of the Negrito-specific CNVRs
is shown in Figure 4.

To confirm the uniqueness of these CNVs in Negritos, we
further compared our dataset with the metropolitan Chinese,
Indians and Malays from SGVP. Seven CNVRs were covered in
SGVP, but none of the putative CNVs we found had been
previously reported.

Gene Ontology and Pathway Analyses 

To understand the putative functional implications of these
CNVs, we performed Gene Ontology (GO) and pathway
analyses on the gene set within the Negrito-specific CNVs using
PANTHER and DAVID (Figure 5). Of the 48 CNVs specific to
Negritos, 29 carried annotated genes, while the remaining 19 lay in
gene-poor regions (Table 4). For all the CNVRs enriched with



PLOS ONE 



| www.plosone.org 



5 



June 2014 | Volume 9 | Issue 6 | e100371 



Population Specific CNV of Negritos 






genes, copy number gains were significantly more frequent than losses
(15 gains versus 6 losses; P<0.001). GO analysis by PANTHER
revealed fourteen genes involved in immune system function and
regulation, response to stimulus and metabolic pathways, whereas
DAVID identified regulation of transcription and regulation of
RNA metabolic process as the most significant GO terms. The
genes involved in the major biological processes are listed in
Table 5.
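The article reports P<0.001 for 15 gains versus 6 losses but does not name the test used. One possible formulation, assumed here and not confirmed by the source, is a one-sided exact binomial test of the observed gain count against a null gain proportion `p0`. The resulting P value depends strongly on the `p0` chosen (for example, 0.5 versus a genome-wide background gain fraction), so `p0` is left as an explicit parameter; the function name is hypothetical.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p0: float) -> float:
    """One-sided exact binomial P value: P(X >= k) for X ~ Binomial(n, p0).

    k:  observed gains among gene-rich CNVRs
    n:  total gene-rich CNVRs (gains + losses)
    p0: assumed null probability that a CNVR is a gain
    """
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i)
               for i in range(k, n + 1))


# Usage: binom_upper_tail(15, 21, p0) for a chosen null proportion p0.
```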
It is estimated that approximately 96% of genome-wide association
studies to date have been conducted on individuals of
European ancestry [37]. There is a growing need to unveil the
spectrum of human genetic diversity by investigating minority
populations, for instance the aboriginal populations of Southeast
Asian (SEA) countries. The Negrito populations of Peninsular
Malaysia are of interest because they are known to be descendants
of the earliest migrants to Southeast Asia. Owing to their relatively
long period of isolation and semi-nomadic lifestyle, they have had
less exposure to urbanization. Their genomes are therefore expected
to be considerably less diverse owing to genetic drift and possibly
founder effects. This makes them well suited to investigating the
genetic forces acting in human evolution, which provides fundamental
knowledge to inform disease-based genetic studies as well as gene
mapping.

In this study, we identified 263 CNVRs in 34 Negrito subjects
from Peninsular Malaysia, 27 of which we believe are novel and
unique to Negritos. After excluding the small CNVs, an average of
23 CNVs was observed per Negrito genome, with more losses (72.6%)
than gains, in line with most reported studies [12,29,32]. The
overall size of the CNVs observed was also consistent with these
studies. Approximately 58% of the CNVs found in Negritos were <30
kb, in line with reports by Yim et al. [27] on Korean genomes
and Ku et al. [32], but higher than the proportions reported by
Zhang et al. [30] and McElroy et al. [33]. The average numbers of
CNVs detected in the HapMap3 dataset (average CNV calls per
genome = 102.2; data not shown) and in Chinese populations
(average CNV calls per genome = 140.9) [29] were much higher.
The number of novel CNVRs identified in Negritos was also lower
(0.85 per genome) than previously reported [29-30,32-33].
This is expected, as we excluded all small CNVs (<10 kb) from
our analyses (~30.8% of the total CNVs identified). Moreover, as
more populations are genotyped, the CNV map becomes more
saturated and fewer novel variants are observed. Collectively, we
observed fewer CNVRs but more alleles (CNVs) in the Negrito
genomes. In general, however, the CNV profile of Negrito genomes
resembles those reported, especially by Ku et al. [32] in three
other SEA populations, except for the X chromosome, which was
not considered in our study.

The variation in the number of CNVs detected could be
attributed to several factors: i) the technology applied for CNV
detection and its resolution; ii) the level of stringency applied when
performing CNV calls; iii) the algorithms applied when
performing CNV calls; and iv) our exclusion of X-chromosome,
telomeric and centromeric CNVs. The application of three
independent CNV algorithms should minimize the false-positive
rate, as shown by Pinto et al. [38]. The poor validation
rate for small CNVs (<10 kb) could be attributed to two
reasons: (i) a poor signal-to-noise ratio in the samples, leading to
false-positive calls by the algorithms; and (ii) inaccurate estimation
of breakpoints for small CNVs owing to limited
probe density, thus leading to inaccuracy when identifying a
Figure 5. Gene Ontology and pathway analyses on the gene set within the Negrito-specific CNVs using PANTHER and DAVID. (a)
PANTHER analysis suggests a major involvement of the genes harboring the population-specific CNVs in the immune system process and response to
stimulus, as well as the metabolic process; (b) DAVID analysis suggests the involvement of the genes harboring the population-specific CNVs in
transcription and regulation of RNA metabolic processes.
doi:10.1371/journal.pone.0100371.g005



Table 5. Pathways and biological processes of the genes underlying the population-specific CNVs in Negritos from Peninsular
Malaysia.

Pathway/Biological function | GO term | Genes
Immune systems and processes | GO:0002376 | TNFRSF8, CSMD1, SH2D4B, TNFRSF1B, LRRC30
Response to stimulus | GO:0050896 | TNFRSF8, CSMD1, SH2D4B, TNFRSF1B
System process | GO:0003008 | SCHIP1, LAMA1, GRID2, PALLD
Metabolic processes | GO:0008152 | NDUFV3, WDR4, NPAS3, ADH7, PKNOX1
Cellular processes | GO:0009987 | TNFRSF8, TNFRSF1B, LAMA1, GRID2
Cell communication | GO:0007154 | TNFRSF8, TNFRSF1B, LAMA1, GRID2
Transcription | GO:0006350 | NPAS3, NR3C2, ZNF343
Regulation of transcription, DNA-dependent | GO:0006355 | PBX1, NPAS3, NR3C2, ZNF343
Regulation of RNA metabolic process | GO:0051252 | PBX1, NPAS3, NR3C2, ZNF343

Analysis was performed using PANTHER and DAVID.
doi:10.1371/journal.pone.0100371.t005
precise CNV during qPCR validation. Caution should therefore be
taken when analysing small CNVs. Collectively, our approach
increases confidence in the quality of the calls, at the expense
of fewer positive calls. Although the number of CNVs was
relatively lower than in previous studies, our calls are
considerably more stringent and carry a higher confidence level.
We believe more novel CNVRs unique to Negritos could be
identified if larger samples were investigated.

Interestingly, the CNVRs enriched with genes showed
significantly more copy number gains. In addition, these
genes are known to be involved in immunity and response to
stimuli, as well as metabolic pathways. We speculate that the
Negritos may have undergone processes of local adaptation and
positive selection, which may have accompanied their expansion
and eventual settlement in forest habitats. This hypothesis is
supported by several previously reported studies [24,30,39].
However, the possibility of other processes, such as genetic drift
acting on random duplications or deletions, should not be ruled
out [40]. Nevertheless, further investigations should be carried
out to confirm these findings.

The health of Negritos has not been studied comprehensively
for several decades, and there are few recent publications [9].
However, early studies indicated that Negritos were under various
medical stresses, especially a high prevalence of communicable
diseases including malaria, tuberculosis, leptospirosis and various
intestinal infections [41]. This could be attributed to their earlier
hunting-gathering lifestyle, which exposed them to a variety of
transmissible diseases. Malnutrition has been reported to be common
amongst aboriginal communities, especially among women [42].
Although there are no recent specific reports on the nutritional
status of Negritos, our observations and direct communications with
the Negrito tribes lead us to believe that the majority are
undernourished. Although we cannot provide unequivocal evidence,
it is conceivable that the biomedical stresses they experienced
over the years resulted in the enrichment of selected genes in
these Negrito-specific CNVs.

This is the first study of genome-wide CNVs in the Negrito 
population from Peninsular Malaysia. We identified putative novel 